\ASEC{Cognitive Economy in Existing Programs}

To date, most programs, including our own 
AI behemoths,  have {\it not} had to cope
with runtime environments that changed significantly.  In Equation 1, the
costs for monitoring and building in self-modification abilities have been
prohibitive, and the need for adaptiveness has been minimal.
The environments have been {\it complex} but not {\it changing}.  To be
``fit'' in such worlds, programs have had to employ sophisticated architectures,
knowledge bases, control strategies, representations, and expressive languages
in which they are specified; but programs have {\it not} needed to reorganize
themselves dynamically, to adaptively alter those sophisticated design
components.  Even in those situations in which
this flexibility must be required in the long run, it has been avoided, because 
(i) self-modification capabilities are tough to build in; many programs have
simple self-reprogramming as their sole task,
and 
(ii) monitoring is an expensive overhead in both space
(detailed record-keeping) and time (finding regularities in such histories).
This raises the issue of efficiency:

\ASSEC{Efficiency}

When we build a new, large AI program, 
we often find ourselves caught in the following
tradeoff: On the one hand, we want to develop our initial systems quickly
and painlessly, since they are experimental and ephemeral.  
We want to be able to modify them later with little or no manual redesign or
reprogramming.
On the other hand,
we want our systems to run as efficiently as possible, to mi\-ni\-mize their
demands for cpu time and for primary and secondary memory.
We are torn between
{\it expressi\-bility} and {\it effi\-ciency}.


Many AI  researchers {\sl  cum} language  designers have  focused on  {\it 
expressibility},  the   problem   of  rapidly   constructing   a   working
experimental vehicle.\foo{This objective is apparent in the goals of 
{\:c EMYCIN} [Feigenbaum 1977], 
{\:c KLONE} [Brachman 1978],
{\:c KRL} [Bobrow and Winograd 1977],
{\:c PSI} [Green 1977], 
{\:c RITA} [Anderson and Gillogly 1976], 
{\:c ROSIE} [Waterman et al. 1979],
and
{\:c SAFE} [Balzer et al. 1977].}
They have produced some superlative software such  as
Interlisp's {\:c DWIM}, File, and Break  packages.
Four fundamental  techniques
utilized in highly {\it expressive} programs are:
({\it i}) reliance upon a very-high-level language,
({\it ii}) planning and reasoning at multiple levels of abstraction,
({\it iii}) inference by pattern-directed knowledge sources, and
({\it iv}) mi\-ni\-mal, nonredundant representation, as in a canonical
gene\-rali\-zation/spe\-ciali\-za\-tion hierarchy.

Both within the body of this paper (the Caching section) and within these
appendices (the Expectation-Filtering and Levels of Abstraction appendices)
we present various
techniques which do not sacrifice expressibility, yet enable
programs to (semi-)automatically improve themselves and thus  increase
their productivity.
One point of this paper is to show that the second goal of AI programming,
short-term runtime efficiency, is not sacrificed by these techniques.

The basic source of power is the ability to predict the way that the program will
be used in the future, and to tailor the program to expedite such uses.\foo{This
approach is sometimes used {\it manually}, as when a program is quickly coded
in an expressive but inefficient form, used heavily, and then recoded in a
much more optimized form.  Thus, e.g., Ray Carhart spent 1977 
rewriting DENDRAL (streamlining its CONGEN module)  for a minicomputer.}


The traditional approach to program optimization has assumed that static analysis of
the programmer's expectations (whether implicit in his code or explicit in his
formal assertions) can identify the significant
optimization opportunities.  Three methods for analyzing program
descriptions in this way are:
({\it i}) analyzing
program flow and structure [Knuth 1968] [Dijkstra 1976],
({\it ii}) designing appropriate data structures,
and ({\it iii}) compiling (as in the {\:c FORTRAN H} compiler) and applying optimizing
transformations to the program (as in
[Darlington and Burstall 1973, 1977] or [Balzer et al. 1977]).

In a changeable environment, 
we advocate the use of methods more {\it dynamic} than these.  Rather than
improving the static description of a program, we propose
that the program modify its own structure to adapt it to its
operational environment.\foo{Some
earlier automatic programming systems [Darlington and Burstall 1973]
[Lenat 1975] [Low 1974] [Kant 1977] [Barstow 1977]
were designed to improve programs' efficiency, and many of their
ideas have found their way into the techniques we now describe.  
Those systems 
had program construction, transformation, and optimization as their primary
task; we are proposing general methods to provide 
self-improving abilities
to other kinds of AI programs
(understanders, theory formers, expert simulators, planners, etc.).}

One valuable source of prediction comes from the program's experiences as it runs.
If a program can ``learn" from its experiences, it might try applying each piece of
newly acquired knowledge to the task of specializing itself, modifying itself
to run more efficiently in the current runtime environment.
We believe that a program's ``intelligence" can be increased in this way;
that is, by increasing its ability to
acquire  appropriate knowledge, to organize that knowledge, and to refine
the conditions under which that knowledge is recognized to be applicable.
For any fixed quantum of manpower ({\it sic}) resources we expend, there is a
limit to the size and complexity of programs which can be successfully implemented.
This barrier might be overcome by programs which are self-extending, which actively
work to enlarge their input domain and their capabilities.

\hbox par 2.9in{    % this makes a space for the little figure; see below.
Notice that representing a corpus of knowledge as a  mi\-ni\-mal
(canonical)      gen\-erali\-za\-tion/spe\-ciali\-za\-tion hierarchy, with
interpretatively-com\-puted in\-he\-ri\-tance, may or may {\it  not}
be cognitively economical:
this technique
fa\-vors ex\-pres\-si\-bi\-lity,  compaction of  re\-pre\-sentation,
and efficiency in updating and altering the knowledge base, but
at  the expense  of
performance. It is  true that  a  dog is a mammal,  and a  mammal is  an
animal, and from those two we could compute that a dog is an animal (see
Fig. 9), but
if the same arc is being repeatedly recomputed then
 it is more cognitively economical to store one redundant  arc.
Studies by ourselves and others
[Hayes-Roth and Hayes-Roth 1974] [Rips et al. 1975] indicate
just such redundancies being created and strengthened by repetitions.
}

% [[in space above, insert DIAGRAM OF DOG/MAMMAL/ANIMAL,
% WITH REDUNDANT ARC AS A !STRONG! DOTTED ARROW]]
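
To make the tradeoff concrete, here is a minimal sketch (in Python; the
{\:c ISA} table, the counters, and the threshold are ours, purely
illustrative) in which a frequently recomputed arc such as ``dog is an
animal" is stored redundantly once its recomputation count passes a
threshold:

# Sketch of caching a frequently recomputed inheritance arc.
# The table, counters, and threshold below are illustrative.
ISA = {"dog": ["mammal"], "mammal": ["animal"], "animal": []}
recompute_counts = {}      # how often each derived arc has been recomputed
CACHE_THRESHOLD = 3        # after this many recomputations, store the arc

def isa(x, y):
    """True if x is a y, chasing generalization links interpretively."""
    if y in ISA.get(x, []):
        return True                       # direct (possibly cached) arc
    for parent in ISA.get(x, []):
        if isa(parent, y):
            # A derived arc had to be recomputed; note it, and cache it
            # redundantly once it has been recomputed often enough.
            recompute_counts[(x, y)] = recompute_counts.get((x, y), 0) + 1
            if recompute_counts[(x, y)] >= CACHE_THRESHOLD:
                ISA[x].append(y)          # the redundant "dog ISA animal" arc
            return True
    return False

for _ in range(4):
    isa("dog", "animal")
assert "animal" in ISA["dog"]             # after repetition, a one-step lookup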

Obviously, the economy of specific representations and related inference
methods depends heavily on underlying machine architectures and costs.  We
assume that intelligent systems aim chiefly to produce useful results
(e.g., novel hypotheses) as rapidly as possible.  In short, they should
search cleverly and fast.  Thus, {\sl a priori} notions of aesthetics or design
economy have little bearing on cognitive economy.  Massive redundancies
in storage, processors, and inference processes seem more directly
relevant.  Furthermore, we suppose that, like people, intelligent software
should eliminate slow, interpretive searches for previously solved
problems (especially if the same problems are repeated often).
To this end, cognitive economy may be achieved
through dynamic compilation of action sequences.  Such compiling would
reduce the time-cost to effect the corresponding behavior; on the other
hand, this efficiency would appear to be gained only
at the cost of interpretability,
interruptibility, and modifiability.  Those attributes are desirable precisely when
a program may need continual changing.
In typical situations, both efficiency
of compiled forms and accessibility to declarative forms are intermittently
needed.  These different needs point again to the economic benefits of
maintaining simultaneously alternative and redundant representations.
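
One way to maintain both forms is sketched below (in Python; the class is
ours, and its ``compilation," which merely caches a composed procedure,
is an illustrative stand-in for genuine dynamic compilation).  The
declarative form stays available for inspection and modification, and the
compiled form is rebuilt lazily whenever the declarative form changes:

# Sketch: a declarative action sequence (interpretable, modifiable) kept
# alongside a dynamically "compiled" form, which is rebuilt lazily.
class ActionSequence:
    def __init__(self, steps):
        self.steps = list(steps)      # declarative form: a list of functions
        self._compiled = None         # cached compiled form

    def edit(self, i, new_step):
        """Modify the declarative form; the compiled form is now stale."""
        self.steps[i] = new_step
        self._compiled = None

    def compile(self):
        """'Compile' the sequence into a single composed procedure."""
        steps = tuple(self.steps)
        def composed(x):
            for step in steps:
                x = step(x)
            return x
        self._compiled = composed
        return composed

    def run(self, x):
        """Use the fast compiled form when it exists, else rebuild it."""
        if self._compiled is None:
            self.compile()
        return self._compiled(x)

seq = ActionSequence([lambda x: x + 1, lambda x: x * 2])
print(seq.run(3))               # 8, via the compiled form
seq.edit(1, lambda x: x * 10)
print(seq.run(3))               # 40, recompiled after the modification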

\ASSEC{When It Pays to Adapt}


We can abstractly  characterize the situations in which an AI program
would find it cognitively economical to monitor and adapt itself: the resources
available for specific tasks vary significantly; or the specific tasks have (at
least locally) some similarities, some structure which enables them to be solved
quickly in the same particular way.  Here are some examples:

The load average of a time-sharing system fluctuates with the time of day, but
users demand that their speech understanding system always
provide the best guess it can in real time.

A hospital simulation program can run at excruciating levels of detail,
and it lives on a dedicated machine. But
there's been a bad accident nearby and within one minute the police must have
an estimate of how many emergency cases the hospital can accept.

A program is given an enormous amount of resources to prepare for each task,
but few resources between the time a task is presented and the time an answer is
called for.  It may be cost effective for the program to spend some of its
preparation (inter-task) time trying to {\it discover} new algorithms,
representations, etc. We are struck by this in nature (evolution), though
as yet we see only glimpses of it in AI programs (such as Eurisko).

A system is sold to several customers, each with a distinct but complex
task environment in which it will live.  It then becomes cost effective to
spend some resources streamlining the program (either manually or
automatically) to fit that user.  
In terms of Equation 1, the environment changes only once, so no continuous
monitoring and modification is performed; the one-time cost of the modification
can be amortized over the lifetime of the program (i.e., the penalty
for non-specialization is integrated over time.)
The well-known SYSGEN generation of
tailored operating systems is a simple example of this.

While maintaining a file system, the program notices patterns in each user's
accesses to his files, and in those of the community as a whole, and occasionally
selects a new data structure for some part of the system to make it run more
efficiently. See [Low 1976].


\ASEC{The Assumptions}

{\bf Summary:\  \sl The claims we make elsewhere in
this paper presuppose that ``intelligence" can be
adequately modelled as heuristic search, guided by
pattern-directed rules, implementable on a uniprocessor.}

\yskip

Every scientific theory is  constructed in a  rich context of  surrounding
theories, methods,  and standards  determining which experiments  are
reasonable ones to perform and which point of view to take when  interpreting
the results  --- in short, a {\it paradigm}. We feel it useful
to articulate the ``core" of our paradigm (the assumptions our  theory
rests upon) before  delving into more detail.
For that reason, this section is devoted to a presentation of our
assumptions, including:
a model for intelligence
(Appendix 2.1),  a  model for  how  intelligent computer  programs  may  be
organized (Appendix 2.2),  and a  model of the  characteristics of  present
computing engines (Appendix 2.3).  Briefly:

(i)  We  continually  face  searches  in  more  or  less  immense  spaces;
intelligence is the ability to bring {\it appropriate} knowledge to  bear,
to speed  up  such  searching.   Increasing  intelligence  then  comprises
increasing  knowledge, improving its  organization,  and refining the
conditions  for  its
applicability.  How are  these new discoveries  made? Empirical  induction
(generalizing from experimental and other observations), analogy, and  the
criticism  of  existing   theory  all   lead  to   new  conjectures,   new
possibilities.

(ii) Intelligent systems  can make  the applicability  of their  knowledge
explicit by representing that knowledge as condition-action rules ({\:c IF {\sl
situation}  THEN  {\sl   appropriate  reaction}}).  Such   pattern-directed
inference systems (PDIS) can benefit from a schema representation  (frame,
unit, being, script, etc.),  because this adds  structure which the  rule
descriptions can exploit.

(iii) Current computing technology presents us  with limited cycles, cheap  storage,
and expensive knowledge acquisition.



% \vfill\eject\SSEC{A Model of Intelligence}
\ASSEC{A Model of Intelligence}

{\bf Summary:\  \sl Since ``intelligence" depends upon bringing relevant
knowledge to bear, it can be increased not only by adding new knowledge,
but also by better organizing the existing knowledge.}

\yskip

Many human cognitive activities, including most of those commonly referred
to as ``requiring intelligence," can be cast as searches, as  explorations
through  a  search  space,  meandering  toward  some  goal.   We  call   a
problem-solver ``more  intelligent''  if  it can work efficiently toward  a
solution even in the  face of an immense  search space and an  ill-defined
goal.  Somehow, it is  imposing more structure on  the problem, and  using
that to search more effectively. How might it do this?
According to our model of human intelligence, it accomplishes the task  by
bringing knowledge  to  bear, knowledge  not  supplied explicitly  in  the
problem statement.  This knowledge can be about problem-solving in general
(e.g., how to recognize and break cultural blocks [Adams 1974])
 or about the  problem
domain specifically (e.g., which groups of chemical
atoms can usually be treated as aggregate
superatoms [Feigenbaum 1977]).

This implies  that a  problem  solver can  become  more effective  by  (i)
increasing its knowledge, (ii) better organizing its knowledge, and  (iii)
refining its criteria for  deciding when various  pieces of knowledge  are
applicable.  In terms of production  (If-Then) rules, these correspond  to
adding new rules, modifying  the data structure in  which rules are  held,
and tuning the conditions (If parts) of existing rules.
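
The following sketch (in Python; the rules, the topic index, and the
tuning operator are ours, purely illustrative) shows these three kinds of
improvement side by side: adding a rule, filing it under a topic so that
only plausibly relevant rules are consulted, and conjoining an extra test
onto a rule's If part:

# Sketch of the three improvements named above, for If-Then rules.
class Rule:
    def __init__(self, name, condition, action):
        self.name, self.condition, self.action = name, condition, action

rules = []                                 # (i) knowledge: the rule set
index = {}                                 # (ii) organization: rules by topic

def add_rule(rule, topic):
    """(i) Increase knowledge: add a new rule, filed under a topic."""
    rules.append(rule)
    index.setdefault(topic, []).append(rule)

def fire(topic, situation):
    """(ii) Better organization: consult only rules filed under the topic."""
    return [r.action(situation) for r in index.get(topic, [])
            if r.condition(situation)]

def tune_condition(rule, extra_test):
    """(iii) Refine applicability: conjoin an extra test onto the If part."""
    old = rule.condition
    rule.condition = lambda s: old(s) and extra_test(s)

r = Rule("try-integration-by-parts",
         lambda s: s["task"] == "integrate",
         lambda s: "integrate by parts")
add_rule(r, "calculus")
print(fire("calculus", {"task": "integrate", "product-form": False}))  # fires
tune_condition(r, lambda s: s["product-form"])
print(fire("calculus", {"task": "integrate", "product-form": False}))  # now []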

One very basic assumption we are making is that of {\it continuity}:
the problem-solving environment does not change its character too
radically, too rapidly.  That is, our environments are {\it fluid} but
not gaseous.  Let's see why this assumption is required.
If the program abstracts from its experiences a plausible heuristic H, it may be
able to analyze past scenarios to verify that possessing and obeying H
would definitely have improved its performance (e.g., led to the current
state of knowledge more quickly);
but its only justification for believing that H will help it in the {\it
future} is the empirical evidence for the continuity of the environment.
Learning can be translated effectively into action only when the structure of
the environment has not altered too greatly.

A related assumption is that of {\it irreversible knowledge accretion}.
Our body of facts and observations continually grows and only occasionally
shrinks.  Certainly the abandonment of a whole system of thought is a
rare event indeed (see [Kuhn 1972]), 
whereas the adoption of new systems of thought is part of
the standard educational process.


\hbox par 2.9in{    % this makes a space for the little figure; see below.
New knowledge is discovered by 
specializing some broadly applicable but weak principles, or by
generalizing
particular experiences. The latter includes abstraction
(condense some experiences into
a  heuristic which
would have greatly aided us in the  past if only we'd had it) and
analogy (draw parallels  to  other
existing facts and heuristics,  and also to the  {\sl paths} which led  to
their discovery.)
One sometimes ``spirals in'' [Lakatos 1976]
toward precision and competence (see Fig. 10).}

%(hypothesize, criticise, and
%improve the hypothesis) [Lakatos 1976].


   \ASSEC{A Model of Intelligent Program Organization}

{\bf Summary:\  \sl 
Since we want our AI programs to access, reason about, and expand their own knowledge
bases, it is useful to represent such knowledge in a clean, modular form
(e.g., employing any one of the current schematized representations.)}

\yskip

The preceding remarks about intelligent problem solving apply to
``hardware" and ``wetware" alike.  To be intelligent, computer programs
must  ultimately  grapple  with   the  tasks  of  knowledge   acquisition,
representation, and refinement. We cannot provide an absolute answer to how
they should be organized, but we  can posit some design constraints  which
have proven useful so far.

A very  general heuristic  in AI  programming is  the following:  {\sl If  your
program is  going  to  modify its  own  {\bf  X}, then  make  {\bf  X}  as
separable, clean, and explicit  as possible.} In our  case, {\bf X} can  be
instantiated as  ``knowledge," or  as  ``applicability conditions  for  each
piece of knowledge."  In this case, the heuristic advises us to represent our
knowledge in a separate,  clean, explicit form,  say as knowledge  modules
having some fixed internal 
structure, and also advises us to keep the applicability
conditions for  each  knowledge  module  separate from  the  rest  of  the
knowledge it contains.

This naturally leads us to  a pattern-directed inference system, in  which
knowledge is broken into  separate modules, each with  an explicit set  of
relevancy tests [Waterman and Hayes-Roth 1978].  Such systems arising in
Pittsburgh may champion
syntactic purity, while  those arising  in
California may tend toward the baroque,
but such variations  should not obscure  their common source  of
power.  The PDIS architect breaks his program's knowledge into a set  of
condition-action  production rules.

Having a clean  representation for  rules means having  a clean,  precise,
elegant language in which to express them.  By structuring the  conceptual
knowledge of  the system,  by partitioning  each module's  knowledge  into
several categories, a rule can  condense an entire cumbersome  description
into  a  simple reference (often a single pointer).  The  popular  schematized
representations  of
knowledge (scripts for episodes, frames for static situations, beings  for
specialists, units for  everything) enable a  rule like ``If  there are  no
known methods specific to finding new examples of prime numbers,  then..."
to have its condition coded as a simple null-test on the ``To-get" subslot of
the ``Examples" slot of the schema called ``Prime Numbers."  
By a judicious  choice
of slot  types and  subslot  types, the  system  builder can  reduce  most
triggering conditions  to  such  quick  checks on  the  state  of  various
(subslots of) slots of schemata.
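
A minimal sketch of the ``Prime Numbers" example (in Python; the schema
contents and the accessor are ours, purely illustrative) shows how such a
triggering condition reduces to a single quick check:

# A schema as a nest of slots and subslots; the rule's IF part becomes a
# null-test on one subslot.  The contents here are illustrative.
schemata = {
    "Prime-Numbers": {
        "Examples": {
            "Known":  [2, 3, 5, 7, 11],
            "To-get": [],        # no methods specific to finding new examples
        },
    },
}

def get(schema, slot, subslot):
    """Fetch a (sub)slot value; missing slots read as empty."""
    return schemata.get(schema, {}).get(slot, {}).get(subslot, [])

# "IF there are no known methods specific to finding new examples of
#  prime numbers THEN ..." triggers on one quick check:
if not get("Prime-Numbers", "Examples", "To-get"):
    print("trigger: fall back on general methods for finding examples")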

Additional knowledge can be
brought to bear to locate efficiently all the rules which
{\it might} have their conditions satisfied in a given situation, and also
to decide which rules to execute (obey) among those whose IF  parts
have triggered (been satisfied).

   \ASSEC{A Model of (Present) Computers}


{\bf Summary:\  \sl 
Since we  are  considering the  problem  of building  computer  models  of
intelligent  behavior,   many   of   our   assumptions   deal   with   the
characteristics of present-day computers. They are the symbol manipulators
we use, but were not designed for that general purpose.}


\yskip

The foremost quality of computing machines, as we know them today, is their
limited ``inferential bandwidth."  Although they provide virtually {\sl (sic)}
limitless storage capacity, these machines can process only a small fraction
of this information in a reasonable amount of time.  Two primary reasons
for this disparity between storage and generation capacities are:
(1) knowledge acquisition proceeds slowly and expensively; and (2) inference
processes depend on operations that ordinarily require cycles of
uniprocessor systems.  

The first problem places a {\it cultural} limitation on the
development of intelligent machines.  This development is restrained by our
inability to express what we would like our machines to do or to convert
these expressions into effective machine interpretable codes.  

The second
problem reflects a {\it technological} constriction of our potential.  The
so-called von Neumann bottleneck, caused by machine architectures that
sequentially interpret program steps through a single logical processor,
strongly motivates concern for programs that can mini\-mize unnecessary
computations.  

In some future environment, perhaps neither of these problems
will persist.  In the meantime, however, we assume that two of the best ways
to increase AI system productivity, derived from these current limitations,
will be to expedite knowledge acquisition and to avoid unnecessary computing
(for instance, by opting to ``store \& recall" rather than ``forget \&
recompute").


\ASEC{Another Example of GET}

Here is another example of how Eurisko's expanded 
{\:c GET}
 works:
we consider an access on the 
{\:c PARENTS}
 slot, when that slot is not primitive,
but rather is defined as the union of slots for father and mother.

\parskip 1pt

{\:c (GET MERLE PARENTS)}

{\:c ((GET PARENTS DEFN) MERLE)}

{\:c (((GET DEFN DEFN) PARENTS) MERLE)}

{\:c ((($\lambda$ (s) ((GET (GET s SLOT-COMBINER) DEFN) (GET s BUILT-FROM))) PARENTS) MERLE)}

{\:c (((GET (GET PARENTS SLOT-COMBINER) DEFN) (GET PARENTS BUILT-FROM)) MERLE)}

{\:c (((GET UNION DEFN) FATHER MOTHER) MERLE)}

{\:c ((($\lambda$ (s1 s2) ($\lambda$ (c) (LIST (GET c s1) (GET c s2)))) FATHER MOTHER)  MERLE)}

{\:c (($\lambda$ (c) (LIST (GET c FATHER) (GET c MOTHER)))  MERLE)}

{\:c (LIST (GET MERLE FATHER) (GET MERLE MOTHER))}

{\:c (LIST SID BETTY)}

{\:c (SID BETTY)}



\parskip 20pt plus 3pt minus 1pt \lineskip 8pt
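
A minimal reconstruction of this behavior (in Python, not Eurisko's
Interlisp; the unit contents are abbreviated) computes the virtual
{\:c PARENTS} slot from its {\:c SLOT-COMBINER} and {\:c BUILT-FROM} facets:

# Reconstruction of the expanded GET traced above: PARENTS is not stored
# directly, but is defined as the UNION of the FATHER and MOTHER slots.
units = {
    "MERLE":   {"FATHER": ["SID"], "MOTHER": ["BETTY"]},
    "PARENTS": {"SLOT-COMBINER": "UNION", "BUILT-FROM": ["FATHER", "MOTHER"]},
}

def union_defn(s1, s2):
    """(GET UNION DEFN): build the virtual slot from two stored slots."""
    return lambda unit: get(unit, s1) + get(unit, s2)

combiner_defns = {"UNION": union_defn}

def get(unit, slot):
    stored = units.get(unit, {}).get(slot)
    if stored is not None:                       # primitive (stored) slot
        return stored
    slot_unit = units.get(slot, {})
    combiner = slot_unit.get("SLOT-COMBINER")
    if combiner:                                 # virtual slot: compute it
        defn = combiner_defns[combiner](*slot_unit["BUILT-FROM"])
        return defn(unit)
    return []

print(get("MERLE", "PARENTS"))                   # ['SID', 'BETTY']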

\yskip

\ASEC{Dynamically Monitoring the Task Environment}


{\bf Summary:\  \sl
The body of this paper has amply
illustrated how a program might profit from knowledge
it gathered dynamically. This suggests that
rather than working on the given problem exclusively,
a program may be better off expending some
time {\it learning} (about the specific problem, its broader domain, or
problem-solving in
general).  Note that this suggestion encompasses traditional education (being
told), empirical research (observing),
and theoretical exploration (predicting).
While such ``very high-level" problem-solving strategies typify human
problem-solvers (especially those of us who have
rationalized spending twenty or more years ``in school''), very few programs to date
have employed any of these forms of learning to improve their operation.}

\yskip


We discussed in Section 1 the ``environmental fluidity"
conditions under which it could be cost effective for
an intelligent system to be able to improve its
performance dynamically.  To accomplish this, a program
must first be able to monitor its processing, i.e.,
to sense, record, and analyze
its representations, data structures, algorithms, inference procedures, and the
way they are currently being relied upon.
It can then use this gathered knowledge to modify or redesign itself to
perform more efficiently.


Some programs (and wish-lists for program capabilities) have included learning
more about the problem being worked on;\foo{ McCarthy's advice-taker and
Balzer's dialogue system (to get airline reservation programs written) would
be told, Gelernter's theorem-prover (employing diagrams as analogic
models) could observe, Lenat's AM
could observe and predict.}
in this paper, however, we are stressing learning about the
ways the program is being currently employed.

{\bf Learning by Being Told}

There are many situations in which a user could provide advice to a program:

\yskip

{\it (a). } The user might want to convey to the program
some task-specific item: ``Integration by
parts won't work on this problem;'' ``This is a very hard
problem.''   This can easily be input as part of the user's statement of the
problem for the system to work on; e.g., the user might designate some methods
as being preferred or constrained, he might estimate a resource allotment, etc.

{\it (b). } Another type of advice
will apply permanently to the nature of the program's functioning
(e.g., ``Answers, even if uncertain, must be produced in real time;''
``Hard problems will be
starred.'')  Such knowledge can be reflected permanently in
the design of the program, in its code.

{\it (c). }  But there is an intermediate level of advice, some information
the user knows will concern
the next several tasks to be submitted --- not just the present one and not
all future ones:
``The next four problems will be supplied in order of
expected difficulty;" ``Several of these problems share common assumptions
and raise related issues;" ``The user for the next entire
session will be a physician;" ``Until I find out why F34 returned NIL, I'll
be in debugging mode, not production mode."  Such advice is rarely communicable
even to AI programs.

We believe this type of intermediate-level
advice would improve
programs if they had the opportunity to accept and use it.
In general, we would like to advise the program which capabilities to
consider and which to ignore during a prolonged period of solving closely
related problems.
A fixed (static) approach, for example, would be to define a few modes
(e.g., Debug, Demo, Production, Input,...) and permanently build into the code
all changes required to specialize the system for each mode. A flexible
(dynamic) approach would
provide a larger language in which the user could describe his current
mode of usage of the program (e.g., by typing ``{\:c USAGE of KNOWLEDGE-BASE
will-be INCREASED VERY-MUCH}'').
The program would translate such statements into
changes in its data structures, algorithms, explanations, core management
strategies, etc. In the current example, heavy usage of the knowledge
base might
cause the program to swap out its compiler, optimizers, and other code,
enabling it to hold many more knowledge chunks at a time in core.
The ability to hold more knowledge in memory would directly affect the rate
and probability of productive inferences.\foo{Similar effects have been shown
for humans: [B. Hayes-Roth 1979 {\it Cognitive Science}, in press]}
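
A sketch of such a mechanism (in Python; the advice vocabulary and the
particular reconfigurations are ours, purely illustrative) maps a usage
declaration onto changes in the program's configuration:

# A small "usage advice" vocabulary and the reconfigurations a program
# might make in response.  All names and numbers are illustrative.
config = {"compiler-resident": True, "knowledge-chunks-in-core": 200}

def expect_heavy_kb_usage(cfg):
    """Swap out the compiler and optimizers; hold more knowledge in core."""
    cfg["compiler-resident"] = False
    cfg["knowledge-chunks-in-core"] *= 5

def expect_debug_mode(cfg):
    """Keep everything resident and interpretable for easy inspection."""
    cfg["compiler-resident"] = True
    cfg["knowledge-chunks-in-core"] = 50

advice_table = {
    ("KNOWLEDGE-BASE", "INCREASED", "VERY-MUCH"): expect_heavy_kb_usage,
    ("DEBUGGING",      "INCREASED", "SOMEWHAT"):  expect_debug_mode,
}

def accept_advice(resource, direction, amount):
    """Translate a user's usage declaration into configuration changes."""
    action = advice_table.get((resource, direction, amount))
    if action:
        action(config)

accept_advice("KNOWLEDGE-BASE", "INCREASED", "VERY-MUCH")
print(config)   # {'compiler-resident': False, 'knowledge-chunks-in-core': 1000}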

\yskip

{\bf Learning by Observing}

Spending a little time and space reconfiguring to reflect the current state
of the world (the nature of the recent and succeeding inputs) seems
worthwhile; the problem is knowing what is unusual about the current
environment. In the last subsection, we discussed the case where the
user explicitly tells the program. But this is often impossible (as when
the
user is unaware of an impending irregularity, or when the ``user'' is
Nature), and almost
always a nuisance (learning the language in which to express such advice,
remembering to do it, taking the time to do it). 

The next step seems to be to allow the program to {\it infer} such facts
about its run-time environment from observations.
What kinds of self-monitoring can be usefully employed?

The method which comes to mind first, perhaps, is to save a complete
record of everything that ever happened to this program: all the inputs
it received (when, from whom), function calls and accesses it made internally,
paths it chose/rejected (and why),
and outputs it generated.  The program could in principle
induce, from such an exhaustive
history-list, patterns such as ``very light usage of
compiler;'' ``SQRT is always called
on the result of SUM-OF-SQUARES.''

The space costs of such an archive, and the temporal cost of the
procedures
to infer useful patterns, are surely prohibitive.
The more economical approach is to analyze in advance (and change slowly
throughout the program's lifetime [Samuel 1963]) certain
features we would wish to look for in a complete history record, and then
to have the program record just enough dynamically to gather and maintain
those features. Some of these will be domain-independent and
commonly useful:
({\it i}) frequency of calls on various functions, requests for subtasks, etc.;
({\it ii}) patterns in the arguments to calls on certain functions and in the inputs to the program;
({\it iii}) patterns in the sequence of calls on functions; and
({\it iv}) patterns in functions' results (e.g., F34 always seems to
return a User-Name).
Some of the features will be domain-dependent specializations of the
preceding features (``Average degree of input equation'' is a special
``pattern in the input.'')
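
The following sketch (in Python; the monitored functions and the recorded
features are ours, purely illustrative) records just two such features,
call frequencies and the types of results returned, rather than a
complete history:

# Record only a few pre-chosen features of the program's own behavior,
# instead of an exhaustive history.  The features here are illustrative.
from collections import Counter
import functools

call_counts = Counter()          # frequency of calls on various functions
result_types = {}                # crude "patterns in functions' results"

def monitored(fn):
    """Wrap a function so that each call updates the feature records."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] += 1
        result = fn(*args, **kwargs)
        result_types.setdefault(fn.__name__, set()).add(type(result).__name__)
        return result
    return wrapper

@monitored
def sum_of_squares(xs):
    return sum(x * x for x in xs)

@monitored
def sqrt(x):
    return x ** 0.5

for xs in ([3, 4], [5, 12], [8, 15]):
    sqrt(sum_of_squares(xs))

print(call_counts)     # Counter({'sum_of_squares': 3, 'sqrt': 3})
print(result_types)    # {'sum_of_squares': {'int'}, 'sqrt': {'float'}}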


{\bf Learning by Predicting}

The theoretician and the engineer share
a powerful technique: manipulating
a model of a system to generate predictions and identify those possible
outcomes that would be desirable.
This is a third route to learning.
How might programs produce it?
What sorts of theories, or models, can be employed to generate new
information?

{\it Models of the program's users} can be valuable. This includes
maintaining and extending profiles of individual users and whole
groups. When a user opens a dialogue with Eurisko with the word
``Let", it is a safe bet that he's a mathematician.  Later, when he
uses the word ``function," his membership in that group makes it easy
to determine which meaning he has in mind
(instead of, say, what a physiologist or a computer scientist or sex therapist
might mean
by ``function.'')
When the program wishes to
mention $\sqrt{-1}$, it uses its 
{\:c MATHEMATICIAN}
 model to
choose an appropriate notation (``{\sl i}'' for mathematicians
instead of, say, ``{\sl j}'' for
engineers).  Models can be arranged in a hierarchy, with inheritance
of features from more general groups to their subsets, a lattice
ultimately terminating in models of individuals.  User inputs would
be recognized by various models, which would then hypothesize membership
in that group (or explicate some future conditions which would confirm or
deny such inclusion).  Once triggered in this way, the user's apparent groups
(and their generalizations) would actively
filter his inputs and outputs, and could modulate the program's actions (e.g., by
changing the worth of the ``Proof" concept, which in turn would change
the priority rating of tasks on an AM-like agenda and thus alter
the program's choice of what to work on next.)
New user models can be created either top-down (e.g., when a model has
many diverse instances or when it's proven historically to be very valuable,
then try to split
it into several specializations, each having a share of the general model's
instances) or bottom-up (i.e., notice that a group of individuals share a
set of common characteristics, and create a new user group out of them).
At our suggestion, a student at CMU has built a program embodying some of
these ideas [Rich 1979].
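
A minimal sketch of such a lattice (in Python; the groups, trigger words,
and features are ours, purely illustrative) recognizes a probable group
from trigger words and inherits features up the generalization chain:

# A small hierarchy of user models with inherited features.
user_models = {
    "USER":          {"isa": None,   "triggers": [],        "features": {}},
    "SCIENTIST":     {"isa": "USER", "triggers": [],        "features": {}},
    "MATHEMATICIAN": {"isa": "SCIENTIST",
                      "triggers": ["Let", "theorem"],
                      "features": {"sqrt-minus-one": "i"}},
    "ENGINEER":      {"isa": "SCIENTIST",
                      "triggers": ["impedance"],
                      "features": {"sqrt-minus-one": "j"}},
}

def recognize(utterance):
    """Hypothesize the user's group from trigger words in the input."""
    for group, model in user_models.items():
        if any(word in utterance.split() for word in model["triggers"]):
            return group
    return "USER"

def feature(group, name):
    """Look up a feature, inheriting along the group's generalization chain."""
    while group is not None:
        model = user_models[group]
        if name in model["features"]:
            return model["features"][name]
        group = model["isa"]
    return None

group = recognize("Let f be a continuous function")
print(group, feature(group, "sqrt-minus-one"))    # MATHEMATICIAN i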

The preceding kind of prediction is based on a theory of user types, embodied
in a set of ``typical representative'' models.  The same source of power can
be used in many ways: stereotyping of events (as in scripts), stereotyping
of static complexes (as in frames), stereotyping of communities (as in
actors, beings, and contract nets).  In all these cases, the source of power
lies in quickly recognizing probable membership in (applicability of) some
schema, after which all the rest of the schema
(plus knowledge about all its generalizations) is presumed to be relevant.
Programs might build up and utilize scenarios of how problems are typically
solved in their domain.  The building-up could be top-down via specialization
or bottom-up via generalization, or even ``sideways" via analogy. The 
models would be used for prediction (recognize relevant models and activate
them, have them hypothesize probable future situations).
Together, the models would form a theory of how problems are
solved in that domain.